Overview

Dataset statistics

Number of variables: 20
Number of observations: 338592
Missing cells: 177435
Missing cells (%): 2.6%
Duplicate rows: 334559
Duplicate rows (%): 98.8%
Total size in memory: 51.7 MiB
Average record size in memory: 160.0 B
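The missing-cell and duplicate-row rates above are plain ratios. The sketch below recomputes them in plain Python. It assumes the table is a list of equal-length tuples with `None` marking a missing cell. `dataset_stats` is a hypothetical helper, not part of pandas-profiling. A row counts as a duplicate when an identical row appeared earlier, which is the convention of pandas' `DataFrame.duplicated()` with its default `keep="first"`. We assume the report follows the same convention.

```python
from typing import Hashable, Optional, Sequence, Tuple

Row = Tuple[Optional[Hashable], ...]


def dataset_stats(rows: Sequence[Row]) -> dict:
    """Summarise a table given as equal-length tuples; None marks a missing cell."""
    n_obs = len(rows)
    n_vars = len(rows[0]) if rows else 0
    missing = sum(1 for row in rows for cell in row if cell is None)

    # A row is a duplicate if an identical row (None compares equal to None) came earlier.
    seen = set()
    duplicates = 0
    for row in rows:
        if row in seen:
            duplicates += 1
        else:
            seen.add(row)

    total_cells = n_obs * n_vars
    return {
        "n_vars": n_vars,
        "n_obs": n_obs,
        "missing_cells": missing,
        "missing_cells_pct": 100.0 * missing / total_cells if total_cells else 0.0,
        "duplicate_rows": duplicates,
        "duplicate_rows_pct": 100.0 * duplicates / n_obs if n_obs else 0.0,
    }


rows = [
    (0, "A", 50000),
    (0, "A", 50000),
    (0, None, 76000),
    (0, None, 76000),
    (17, "B", 51000),
]
print(dataset_stats(rows))
```

Using the report's own totals, 177435 / (20 × 338592) ≈ 2.6% of cells are missing, and 334559 / 338592 ≈ 98.8% of rows are duplicates. That leaves 4033 distinct rows.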

Variable types

Categorical: 3
Numeric: 17

Warnings

Dataset has 334559 (98.8%) duplicate rows [Duplicates]
Honsyokin1 is highly correlated with Honsyokin2 and 3 other fields [High correlation]
Honsyokin2 is highly correlated with Honsyokin1 and 3 other fields [High correlation]
Honsyokin3 is highly correlated with Honsyokin1 and 3 other fields [High correlation]
Honsyokin4 is highly correlated with Honsyokin1 and 3 other fields [High correlation]
Honsyokin5 is highly correlated with Honsyokin1 and 3 other fields [High correlation]
Fukasyokin1 is highly correlated with Fukasyokin2 and 1 other field [High correlation]
Fukasyokin2 is highly correlated with Fukasyokin1 and 1 other field [High correlation]
Fukasyokin3 is highly correlated with Fukasyokin1 and 1 other field [High correlation]
FukasyokinBefore1 is highly correlated with FukasyokinBefore2 [High correlation]
FukasyokinBefore2 is highly correlated with FukasyokinBefore1 [High correlation]
Fukasyokin5 is highly correlated with CourseKubunCD [High correlation]
CourseKubunCD is highly correlated with Fukasyokin5 [High correlation]
CourseKubunCD has 177435 (52.4%) missing values [Missing]
Honsyokin6 is highly skewed (γ1 = 31.61072505) [Skewed]
HonsyokinBefore1 is highly skewed (γ1 = 86.10465943) [Skewed]
HonsyokinBefore2 is highly skewed (γ1 = 55.706363) [Skewed]
HonsyokinBefore3 is highly skewed (γ1 = 30.79988175) [Skewed]
HonsyokinBefore4 is highly skewed (γ1 = 27.82624995) [Skewed]
HonsyokinBefore5 is highly skewed (γ1 = 26.26044884) [Skewed]
Fukasyokin1 is highly skewed (γ1 = 20.31334079) [Skewed]
Fukasyokin2 is highly skewed (γ1 = 22.48595087) [Skewed]
Fukasyokin3 is highly skewed (γ1 = 20.27179471) [Skewed]
Fukasyokin4 is highly skewed (γ1 = 40.69144719) [Skewed]
FukasyokinBefore1 is highly skewed (γ1 = 140.505371) [Skewed]
FukasyokinBefore2 is highly skewed (γ1 = 140.0771627) [Skewed]
Honsyokin6 has 337671 (99.7%) zeros [Zeros]
HonsyokinBefore1 has 338206 (99.9%) zeros [Zeros]
HonsyokinBefore2 has 337507 (99.7%) zeros [Zeros]
HonsyokinBefore3 has 336969 (99.5%) zeros [Zeros]
HonsyokinBefore4 has 336656 (99.4%) zeros [Zeros]
HonsyokinBefore5 has 336622 (99.4%) zeros [Zeros]
Fukasyokin1 has 250525 (74.0%) zeros [Zeros]
Fukasyokin2 has 250525 (74.0%) zeros [Zeros]
Fukasyokin3 has 250525 (74.0%) zeros [Zeros]
Fukasyokin4 has 338278 (99.9%) zeros [Zeros]
FukasyokinBefore1 has 338398 (99.9%) zeros [Zeros]
FukasyokinBefore2 has 338214 (99.9%) zeros [Zeros]
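The skewness γ1 quoted in these warnings is a sample skewness. The sketch below computes the adjusted Fisher-Pearson coefficient G1, which is the estimator behind pandas' `Series.skew()`. We assume the report uses that estimator. The threshold of 20 is only inferred from which columns were flagged: Fukasyokin3, at about 20.27, is the least-skewed flagged column, while Honsyokin1, at about 8.46, is not flagged. `adjusted_skewness` and `is_highly_skewed` are hypothetical helpers.

```python
import math
from typing import Sequence

# Assumption: threshold inferred from this report's warnings, not stated in it.
SKEWNESS_THRESHOLD = 20.0


def adjusted_skewness(xs: Sequence[float]) -> float:
    """Adjusted Fisher-Pearson skewness G1 (the estimator behind pandas' Series.skew())."""
    n = len(xs)
    if n < 3:
        raise ValueError("skewness needs at least 3 observations")
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in xs) / n  # third central moment
    if m2 == 0:
        return 0.0  # a constant column is treated as unskewed
    g1 = m3 / m2 ** 1.5
    return math.sqrt(n * (n - 1)) / (n - 2) * g1  # small-sample adjustment


def is_highly_skewed(xs: Sequence[float], threshold: float = SKEWNESS_THRESHOLD) -> bool:
    return abs(adjusted_skewness(xs)) > threshold


# A column that is almost entirely zero with one large value, like the
# HonsyokinBefore* columns, is extremely right-skewed.
mostly_zero = [0.0] * 999 + [1000.0]
print(adjusted_skewness(mostly_zero), is_highly_skewed(mostly_zero))
```

This is why the columns that are more than 99% zeros all trigger the warning, while the Honsyokin1 to Honsyokin5 columns do not.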

Reproduction

Analysis started: 2021-04-07 13:02:42.543316
Analysis finished: 2021-04-07 13:04:30.868375
Duration: 1 minute and 48.33 seconds
Software version: pandas-profiling v2.11.0

Variables

TrackCDBefore
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.6 MiB

Length

Max length: 2
Median length: 1
Mean length: 1.000035441
Min length: 1

Characters and Unicode

Total characters: 338604
Distinct characters: 3
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common values
Value | Count | Frequency (%)
0 | 338580 | > 99.9%
17 | 12 | < 0.1%

Histogram of lengths of the category

Most occurring words
Value | Count | Frequency (%)
0 | 338580 | > 99.9%
17 | 12 | < 0.1%

Most occurring characters

Value | Count | Frequency (%)
0 | 338580 | > 99.9%
1 | 12 | < 0.1%
7 | 12 | < 0.1%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 338604 | 100.0%

Most frequent character per category

Value | Count | Frequency (%)
0 | 338580 | > 99.9%
1 | 12 | < 0.1%
7 | 12 | < 0.1%

Most occurring scripts

Value | Count | Frequency (%)
Common | 338604 | 100.0%

Most frequent character per script

Value | Count | Frequency (%)
0 | 338580 | > 99.9%
1 | 12 | < 0.1%
7 | 12 | < 0.1%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 338604 | 100.0%

Most frequent character per block

Value | Count | Frequency (%)
0 | 338580 | > 99.9%
1 | 12 | < 0.1%
7 | 12 | < 0.1%

CourseKubunCD
Categorical

HIGH CORRELATION
MISSING

Distinct: 4
Distinct (%): < 0.1%
Missing: 177435
Missing (%): 52.4%
Memory size: 2.6 MiB

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 161157
Distinct characters: 4
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: A
2nd row: B
3rd row: A
4th row: B
5th row: A

Common values
Value | Count | Frequency (%)
A | 76536 | 22.6%
B | 51928 | 15.3%
C | 23987 | 7.1%
D | 8706 | 2.6%
(Missing) | 177435 | 52.4%

Histogram of lengths of the category

Most occurring words
Value | Count | Frequency (%)
a | 76536 | 47.5%
b | 51928 | 32.2%
c | 23987 | 14.9%
d | 8706 | 5.4%

Most occurring characters

Value | Count | Frequency (%)
A | 76536 | 47.5%
B | 51928 | 32.2%
C | 23987 | 14.9%
D | 8706 | 5.4%

Most occurring categories

Value | Count | Frequency (%)
Uppercase Letter | 161157 | 100.0%

Most frequent character per category

Value | Count | Frequency (%)
A | 76536 | 47.5%
B | 51928 | 32.2%
C | 23987 | 14.9%
D | 8706 | 5.4%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 161157 | 100.0%

Most frequent character per script

Value | Count | Frequency (%)
A | 76536 | 47.5%
B | 51928 | 32.2%
C | 23987 | 14.9%
D | 8706 | 5.4%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 161157 | 100.0%

Most frequent character per block

Value | Count | Frequency (%)
A | 76536 | 47.5%
B | 51928 | 32.2%
C | 23987 | 14.9%
D | 8706 | 5.4%

Honsyokin1
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 141
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 109045.3422
Minimum: 35000
Maximum: 3000000
Zeros: 0
Zeros (%): 0.0%
Memory size: 2.6 MiB

Quantile statistics

Minimum: 35000
5th percentile: 50000
Q1: 50000
Median: 72000
Q3: 105000
95th percentile: 230000
Maximum: 3000000
Range: 2965000
Interquartile range (IQR): 55000

Descriptive statistics

Standard deviation: 146874.2518
Coefficient of variation (CV): 1.346909908
Kurtosis: 111.1193109
Mean: 109045.3422
Median Absolute Deviation (MAD): 22000
Skewness: 8.45679402
Sum: 3.69218805 × 10^10
Variance: 2.157204584 × 10^10
Monotonicity: Not monotonic
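The quantile and descriptive statistics follow standard definitions. The sketch below makes three assumptions. First, quantiles use linear interpolation between order statistics, which is NumPy's and pandas' default. Second, the standard deviation is the sample one, with an n - 1 denominator. Third, MAD is the median absolute deviation from the median, which fits the round values reported (for example 22000 for Honsyokin1). `linear_quantile` and `numeric_summary` are hypothetical helpers.

```python
import math
import statistics
from typing import Dict, Sequence


def linear_quantile(sorted_xs: Sequence[float], p: float) -> float:
    """Quantile by linear interpolation between order statistics (NumPy's default method)."""
    pos = p * (len(sorted_xs) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_xs) - 1)
    return sorted_xs[lo] + (sorted_xs[hi] - sorted_xs[lo]) * (pos - lo)


def numeric_summary(xs: Sequence[float]) -> Dict[str, float]:
    s = sorted(xs)
    q1, med, q3 = (linear_quantile(s, p) for p in (0.25, 0.5, 0.75))
    mean = statistics.fmean(s)
    std = statistics.stdev(s)  # sample standard deviation (n - 1 denominator)
    return {
        "min": s[0],
        "p5": linear_quantile(s, 0.05),
        "q1": q1,
        "median": med,
        "q3": q3,
        "p95": linear_quantile(s, 0.95),
        "max": s[-1],
        "range": s[-1] - s[0],
        "iqr": q3 - q1,
        "std": std,
        "cv": std / mean,  # coefficient of variation
        "mad": statistics.median(abs(x - med) for x in s),  # median absolute deviation
    }


print(numeric_summary([1, 2, 3, 4, 100]))
```

For Honsyokin1, these definitions give IQR = 105000 - 50000 = 55000 and Range = 3000000 - 35000 = 2965000. They also give CV = 146874.2518 / 109045.3422 ≈ 1.347, which matches the figures above.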
Histogram with fixed size bins (bins=50)

Common values
Value | Count | Frequency (%)
50000 | 88949 | 26.3%
75000 | 33512 | 9.9%
70000 | 32075 | 9.5%
150000 | 14640 | 4.3%
51000 | 14251 | 4.2%
72000 | 13523 | 4.0%
105000 | 11910 | 3.5%
182000 | 11239 | 3.3%
76000 | 8035 | 2.4%
74000 | 7816 | 2.3%
Other values (131) | 102642 | 30.3%
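The common-values table lists the ten most frequent values and folds the rest into one "Other values (k)" row. The extreme-value tables list the five smallest and five largest distinct values. The sketch below builds both with `collections.Counter`. `common_values` and `extreme_values` are hypothetical helpers. The report apparently omits the remainder row in some cases: FukasyokinBefore1 shows ten of its eleven values with no such row. This sketch therefore does not reproduce every detail.

```python
from collections import Counter
from typing import Hashable, List, Sequence, Tuple

TableRow = Tuple[object, int, float]  # (value, count, frequency in %)


def common_values(values: Sequence[Hashable], top: int = 10) -> List[TableRow]:
    """Top-`top` values by count, plus one 'Other values (k)' row for the remainder."""
    n = len(values)
    counts = Counter(values)
    shown = counts.most_common(top)
    rows: List[TableRow] = [(v, c, 100.0 * c / n) for v, c in shown]
    n_rest = len(counts) - len(shown)
    if n_rest > 0:
        rest = n - sum(c for _, c in shown)
        rows.append((f"Other values ({n_rest})", rest, 100.0 * rest / n))
    return rows


def extreme_values(values: Sequence[float], k: int = 5) -> Tuple[List[TableRow], List[TableRow]]:
    """The k smallest (ascending) and k largest (descending) distinct values with counts."""
    n = len(values)
    counts = Counter(values)
    keys = sorted(counts)
    lowest = [(v, counts[v], 100.0 * counts[v] / n) for v in keys[:k]]
    highest = [(v, counts[v], 100.0 * counts[v] / n) for v in reversed(keys[-k:])]
    return lowest, highest


sample = [1] * 5 + [2] * 3 + [3, 4]
print(common_values(sample, top=2))
print(extreme_values(sample, k=2))
```

For Honsyokin1, subtracting the 10 values shown from the 141 distinct values gives the "Other values (131)" row. The ten rows shown cover 235950 observations, so 338592 - 235950 = 102642 observations fall into that row.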

Minimum 5 values
Value | Count | Frequency (%)
35000 | 78 | < 0.1%
35500 | 15 | < 0.1%
40000 | 1224 | 0.4%
42000 | 342 | 0.1%
45000 | 343 | 0.1%

Maximum 5 values
Value | Count | Frequency (%)
3000000 | 141 | < 0.1%
2500000 | 86 | < 0.1%
2000000 | 144 | < 0.1%
1800000 | 13 | < 0.1%
1500000 | 266 | 0.1%
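The histogram captions suggest the bin count is the number of distinct values capped at 50. Honsyokin6 has 26 distinct values and bins=26, HonsyokinBefore1 has 18 and bins=18, while Honsyokin1 has 141 and bins=50. This rule is inferred from the report, not documented in it. The sketch below does equal-width binning under that assumption. Bins are half-open, and the last bin also includes the maximum, as in `numpy.histogram`. `fixed_size_histogram` is a hypothetical helper.

```python
from typing import List, Sequence

MAX_BINS = 50  # assumption: cap inferred from the histogram captions in this report


def fixed_size_histogram(xs: Sequence[float], max_bins: int = MAX_BINS) -> List[int]:
    """Counts per equal-width bin over [min, max]; the last bin also includes the maximum."""
    bins = min(max_bins, len(set(xs)))
    lo, hi = min(xs), max(xs)
    if hi == lo:
        return [len(xs)]  # a constant column fits in a single bin
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in xs:
        i = int((x - lo) / width)
        counts[min(i, bins - 1)] += 1  # x == max would otherwise fall one past the end
    return counts


print(fixed_size_histogram([0, 0, 0, 10]))
```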

Honsyokin2
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 116
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 43674.33519
Minimum: 15000
Maximum: 1200000
Zeros: 0
Zeros (%): 0.0%
Memory size: 2.6 MiB

Quantile statistics

Minimum: 15000
5th percentile: 20000
Q1: 20000
Median: 29000
Q3: 42000
95th percentile: 96000
Maximum: 1200000
Range: 1185000
Interquartile range (IQR): 22000

Descriptive statistics

Standard deviation: 59008.41008
Coefficient of variation (CV): 1.351100362
Kurtosis: 109.9323862
Mean: 43674.33519
Median Absolute Deviation (MAD): 9000
Skewness: 8.426464182
Sum: 1.47877805 × 10^10
Variance: 3481992460
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values
Value | Count | Frequency (%)
20000 | 102993 | 30.4%
30000 | 49225 | 14.5%
28000 | 34863 | 10.3%
29000 | 15768 | 4.7%
60000 | 14601 | 4.3%
42000 | 14040 | 4.1%
73000 | 11216 | 3.3%
18000 | 7311 | 2.2%
40000 | 6187 | 1.8%
24000 | 5969 | 1.8%
Other values (106) | 76419 | 22.6%

Minimum 5 values
Value | Count | Frequency (%)
15000 | 14 | < 0.1%
16000 | 1224 | 0.4%
16500 | 207 | 0.1%
17000 | 331 | 0.1%
18000 | 7311 | 2.2%

Maximum 5 values
Value | Count | Frequency (%)
1200000 | 141 | < 0.1%
1000000 | 86 | < 0.1%
800000 | 144 | < 0.1%
720000 | 13 | < 0.1%
680000 | 17 | < 0.1%

Honsyokin3
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 107
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 27608.88101
Minimum: 9450
Maximum: 750000
Zeros: 0
Zeros (%): 0.0%
Memory size: 2.6 MiB

Quantile statistics

Minimum: 9450
5th percentile: 13000
Q1: 13000
Median: 18000
Q3: 26000
95th percentile: 58000
Maximum: 750000
Range: 740550
Interquartile range (IQR): 13000

Descriptive statistics

Standard deviation: 36855.81516
Coefficient of variation (CV): 1.334926075
Kurtosis: 110.4754713
Mean: 27608.88101
Median Absolute Deviation (MAD): 5000
Skewness: 8.44937145
Sum: 9348146240
Variance: 1358351111
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values
Value | Count | Frequency (%)
13000 | 102844 | 30.4%
18000 | 50566 | 14.9%
19000 | 49114 | 14.5%
26000 | 15570 | 4.6%
38000 | 14583 | 4.3%
46000 | 11204 | 3.3%
12000 | 10271 | 3.0%
25000 | 6561 | 1.9%
15000 | 6140 | 1.8%
37000 | 5370 | 1.6%
Other values (97) | 66369 | 19.6%

Minimum 5 values
Value | Count | Frequency (%)
9450 | 30 | < 0.1%
10000 | 1224 | 0.4%
10250 | 166 | < 0.1%
10350 | 76 | < 0.1%
11000 | 674 | 0.2%

Maximum 5 values
Value | Count | Frequency (%)
750000 | 141 | < 0.1%
630000 | 86 | < 0.1%
500000 | 144 | < 0.1%
450000 | 13 | < 0.1%
380000 | 266 | 0.1%

Honsyokin4
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 120
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 16466.26926
Minimum: 6000
Maximum: 450000
Zeros: 0
Zeros (%): 0.0%
Memory size: 2.6 MiB

Quantile statistics

Minimum: 6000
5th percentile: 7500
Q1: 7500
Median: 11000
Q3: 16000
95th percentile: 35000
Maximum: 450000
Range: 444000
Interquartile range (IQR): 8500

Descriptive statistics

Standard deviation: 22212.11398
Coefficient of variation (CV): 1.34894636
Kurtosis: 109.6451671
Mean: 16466.26926
Median Absolute Deviation (MAD): 3500
Skewness: 8.422811245
Sum: 5575347040
Variance: 493378007.3
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values
Value | Count | Frequency (%)
11000 | 99654 | 29.4%
7500 | 88655 | 26.2%
27000 | 16133 | 4.8%
23000 | 14632 | 4.3%
7700 | 14176 | 4.2%
16000 | 14043 | 4.1%
15000 | 11410 | 3.4%
22000 | 7289 | 2.2%
6900 | 6952 | 2.1%
9000 | 5989 | 1.8%
Other values (110) | 59659 | 17.6%

Minimum 5 values
Value | Count | Frequency (%)
6000 | 1252 | 0.4%
6250 | 219 | 0.1%
6300 | 331 | 0.1%
6400 | 14 | < 0.1%
6800 | 343 | 0.1%

Maximum 5 values
Value | Count | Frequency (%)
450000 | 141 | < 0.1%
380000 | 86 | < 0.1%
300000 | 144 | < 0.1%
270000 | 13 | < 0.1%
230000 | 266 | 0.1%

Honsyokin5
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 171
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 10906.22132
Minimum: 0
Maximum: 300000
Zeros: 6
Zeros (%): < 0.1%
Memory size: 2.6 MiB

Quantile statistics

Minimum: 0
5th percentile: 5000
Q1: 5000
Median: 7200
Q3: 10500
95th percentile: 23000
Maximum: 300000
Range: 300000
Interquartile range (IQR): 5500

Descriptive statistics

Standard deviation: 14706.25771
Coefficient of variation (CV): 1.348428321
Kurtosis: 110.6004489
Mean: 10906.22132
Median Absolute Deviation (MAD): 2200
Skewness: 8.436496863
Sum: 3692759290
Variance: 216274015.7
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values
Value | Count | Frequency (%)
5000 | 88624 | 26.2%
7500 | 33383 | 9.9%
7000 | 32035 | 9.5%
15000 | 14588 | 4.3%
5100 | 14237 | 4.2%
7200 | 13458 | 4.0%
10500 | 11845 | 3.5%
18200 | 11145 | 3.3%
7600 | 7974 | 2.4%
7400 | 7802 | 2.3%
Other values (161) | 103501 | 30.6%

Minimum 5 values
Value | Count | Frequency (%)
0 | 6 | < 0.1%
2000 | 13 | < 0.1%
2300 | 13 | < 0.1%
2400 | 26 | < 0.1%
2500 | 206 | 0.1%

Maximum 5 values
Value | Count | Frequency (%)
300000 | 141 | < 0.1%
250000 | 86 | < 0.1%
200000 | 144 | < 0.1%
180000 | 13 | < 0.1%
150000 | 266 | 0.1%

Honsyokin6
Real number (ℝ≥0)

SKEWED
ZEROS

Distinct: 26
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 13.87614002
Minimum: 0
Maximum: 19500
Zeros: 337671
Zeros (%): 99.7%
Memory size: 2.6 MiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 0
Q3: 0
95th percentile: 0
Maximum: 19500
Range: 19500
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 317.3554522
Coefficient of variation (CV): 22.87058591
Kurtosis: 1268.993624
Mean: 13.87614002
Median Absolute Deviation (MAD): 0
Skewness: 31.61072505
Sum: 4698350
Variance: 100714.4831
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=26)

Common values
Value | Count | Frequency (%)
0 | 337671 | 99.7%
2500 | 206 | 0.1%
3750 | 138 | < 0.1%
9100 | 78 | < 0.1%
3500 | 57 | < 0.1%
5250 | 48 | < 0.1%
7500 | 43 | < 0.1%
7400 | 42 | < 0.1%
3600 | 38 | < 0.1%
10500 | 35 | < 0.1%
Other values (16) | 236 | 0.1%

Minimum 5 values
Value | Count | Frequency (%)
0 | 337671 | 99.7%
2000 | 13 | < 0.1%
2300 | 13 | < 0.1%
2400 | 26 | < 0.1%
2500 | 206 | 0.1%

Maximum 5 values
Value | Count | Frequency (%)
19500 | 15 | < 0.1%
14500 | 14 | < 0.1%
11500 | 12 | < 0.1%
10500 | 35 | < 0.1%
9100 | 78 | < 0.1%

HonsyokinBefore1
Real number (ℝ≥0)

SKEWED
ZEROS

Distinct: 18
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 181.0408988
Minimum: 0
Maximum: 970000
Zeros: 338206
Zeros (%): 99.9%
Memory size: 2.6 MiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 0
Q3: 0
95th percentile: 0
Maximum: 970000
Range: 970000
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 8581.544188
Coefficient of variation (CV): 47.40113558
Kurtosis: 8880.791198
Mean: 181.0408988
Median Absolute Deviation (MAD): 0
Skewness: 86.10465943
Sum: 61299000
Variance: 73642900.65
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=18)

Common values
Value | Count | Frequency (%)
0 | 338206 | 99.9%
50000 | 78 | < 0.1%
75000 | 45 | < 0.1%
150000 | 40 | < 0.1%
182000 | 29 | < 0.1%
148000 | 27 | < 0.1%
70000 | 24 | < 0.1%
970000 | 17 | < 0.1%
520000 | 16 | < 0.1%
140000 | 16 | < 0.1%
Other values (8) | 94 | < 0.1%

Minimum 5 values
Value | Count | Frequency (%)
0 | 338206 | 99.9%
50000 | 78 | < 0.1%
51000 | 15 | < 0.1%
60000 | 11 | < 0.1%
70000 | 24 | < 0.1%

Maximum 5 values
Value | Count | Frequency (%)
970000 | 17 | < 0.1%
520000 | 16 | < 0.1%
210000 | 7 | < 0.1%
182000 | 29 | < 0.1%
178000 | 14 | < 0.1%

HonsyokinBefore2
Real number (ℝ≥0)

SKEWED
ZEROS

Distinct: 25
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 158.172668
Minimum: 0
Maximum: 390000
Zeros: 337507
Zeros (%): 99.7%
Memory size: 2.6 MiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 0
Q3: 0
95th percentile: 0
Maximum: 390000
Range: 390000
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 4201.197954
Coefficient of variation (CV): 26.56083385
Kurtosis: 4284.417466
Mean: 158.172668
Median Absolute Deviation (MAD): 0
Skewness: 55.706363
Sum: 53556000
Variance: 17650064.25
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=25)

Common values
Value | Count | Frequency (%)
0 | 337507 | 99.7%
20000 | 300 | 0.1%
30000 | 177 | 0.1%
60000 | 79 | < 0.1%
42000 | 74 | < 0.1%
28000 | 66 | < 0.1%
29000 | 61 | < 0.1%
73000 | 52 | < 0.1%
59000 | 36 | < 0.1%
140000 | 27 | < 0.1%
Other values (15) | 213 | 0.1%

Minimum 5 values
Value | Count | Frequency (%)
0 | 337507 | 99.7%
18000 | 14 | < 0.1%
20000 | 300 | 0.1%
24000 | 11 | < 0.1%
28000 | 66 | < 0.1%

Maximum 5 values
Value | Count | Frequency (%)
390000 | 17 | < 0.1%
210000 | 16 | < 0.1%
160000 | 11 | < 0.1%
150000 | 12 | < 0.1%
140000 | 27 | < 0.1%

HonsyokinBefore3
Real number (ℝ≥0)

SKEWED
ZEROS

Distinct: 28
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 133.5117191
Minimum: 0
Maximum: 160000
Zeros: 336969
Zeros (%): 99.5%
Memory size: 2.6 MiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 0
Q3: 0
95th percentile: 0
Maximum: 160000
Range: 160000
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 2508.978045
Coefficient of variation (CV): 18.79219339
Kurtosis: 1274.516322
Mean: 133.5117191
Median Absolute Deviation (MAD): 0
Skewness: 30.79988175
Sum: 45206000
Variance: 6294970.833
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=28)

Common values
Value | Count | Frequency (%)
0 | 336969 | 99.5%
13000 | 449 | 0.1%
19000 | 288 | 0.1%
18000 | 192 | 0.1%
38000 | 97 | < 0.1%
25000 | 66 | < 0.1%
46000 | 64 | < 0.1%
24000 | 51 | < 0.1%
26000 | 45 | < 0.1%
12000 | 44 | < 0.1%
Other values (18) | 327 | 0.1%

Minimum 5 values
Value | Count | Frequency (%)
0 | 336969 | 99.5%
12000 | 44 | < 0.1%
13000 | 449 | 0.1%
18000 | 192 | 0.1%
19000 | 288 | 0.1%

Maximum 5 values
Value | Count | Frequency (%)
160000 | 12 | < 0.1%
110000 | 14 | < 0.1%
100000 | 11 | < 0.1%
98000 | 14 | < 0.1%
95000 | 12 | < 0.1%

HonsyokinBefore4
Real number (ℝ≥0)

SKEWED
ZEROS

Distinct: 27
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 104.2969119
Minimum: 0
Maximum: 93000
Zeros: 336656
Zeros (%): 99.4%
Memory size: 2.6 MiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 0
Q3: 0
95th percentile: 0
Maximum: 93000
Range: 93000
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 1831.11213
Coefficient of variation (CV): 17.55672432
Kurtosis: 978.8591497
Mean: 104.2969119
Median Absolute Deviation (MAD): 0
Skewness: 27.82624995
Sum: 35314100
Variance: 3352971.634
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=27)

Common values
Value | Count | Frequency (%)
0 | 336656 | 99.4%
11000 | 506 | 0.1%
7500 | 385 | 0.1%
15000 | 153 | < 0.1%
27000 | 133 | < 0.1%
23000 | 115 | < 0.1%
7700 | 90 | < 0.1%
16000 | 70 | < 0.1%
59000 | 52 | < 0.1%
26000 | 52 | < 0.1%
Other values (17) | 380 | 0.1%

Minimum 5 values
Value | Count | Frequency (%)
0 | 336656 | 99.4%
6900 | 30 | < 0.1%
7200 | 28 | < 0.1%
7500 | 385 | 0.1%
7700 | 90 | < 0.1%

Maximum 5 values
Value | Count | Frequency (%)
93000 | 12 | < 0.1%
78000 | 25 | < 0.1%
77000 | 12 | < 0.1%
65000 | 14 | < 0.1%
62000 | 16 | < 0.1%

HonsyokinBefore5
Real number (ℝ≥0)

SKEWED
ZEROS

Distinct: 43
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 67.07305548
Minimum: 0
Maximum: 52000
Zeros: 336622
Zeros (%): 99.4%
Memory size: 2.6 MiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 0
Q3: 0
95th percentile: 0
Maximum: 52000
Range: 52000
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 1129.403666
Coefficient of variation (CV): 16.83841086
Kurtosis: 890.9814597
Mean: 67.07305548
Median Absolute Deviation (MAD): 0
Skewness: 26.26044884
Sum: 22710400
Variance: 1275552.64
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=43)

Common values
Value | Count | Frequency (%)
0 | 336622 | 99.4%
5000 | 425 | 0.1%
7500 | 230 | 0.1%
18200 | 134 | < 0.1%
15000 | 92 | < 0.1%
7000 | 79 | < 0.1%
10500 | 71 | < 0.1%
7600 | 69 | < 0.1%
7200 | 65 | < 0.1%
11000 | 55 | < 0.1%
Other values (33) | 750 | 0.2%

Minimum 5 values
Value | Count | Frequency (%)
0 | 336622 | 99.4%
4000 | 13 | < 0.1%
4600 | 13 | < 0.1%
4800 | 54 | < 0.1%
5000 | 425 | 0.1%

Maximum 5 values
Value | Count | Frequency (%)
52000 | 25 | < 0.1%
51000 | 12 | < 0.1%
41000 | 16 | < 0.1%
39000 | 53 | < 0.1%
29000 | 14 | < 0.1%

Fukasyokin1
Real number (ℝ≥0)

HIGH CORRELATION
SKEWED
ZEROS

Distinct: 222
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1856.040958
Minimum: 0
Maximum: 411110
Zeros: 250525
Zeros (%): 74.0%
Memory size: 2.6 MiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 0
Q3: 2100
95th percentile: 4690
Maximum: 411110
Range: 411110
Interquartile range (IQR): 2100

Descriptive statistics

Standard deviation: 13838.30814
Coefficient of variation (CV): 7.455820455
Kurtosis: 445.5434246
Mean: 1856.040958
Median Absolute Deviation (MAD): 0
Skewness: 20.31334079
Sum: 628440620
Variance: 191498772.1
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values
Value | Count | Frequency (%)
0 | 250525 | 74.0%
3500 | 4100 | 1.2%
3430 | 3909 | 1.2%
3570 | 3824 | 1.1%
3640 | 3465 | 1.0%
3360 | 3133 | 0.9%
3850 | 2928 | 0.9%
3710 | 2812 | 0.8%
3780 | 2693 | 0.8%
3920 | 2311 | 0.7%
Other values (212) | 58892 | 17.4%

Minimum 5 values
Value | Count | Frequency (%)
0 | 250525 | 74.0%
1120 | 5 | < 0.1%
1190 | 12 | < 0.1%
1260 | 17 | < 0.1%
1330 | 22 | < 0.1%

Maximum 5 values
Value | Count | Frequency (%)
411110 | 17 | < 0.1%
390320 | 15 | < 0.1%
378070 | 14 | < 0.1%
373100 | 14 | < 0.1%
364840 | 13 | < 0.1%

Fukasyokin2
Real number (ℝ≥0)

HIGH CORRELATION
SKEWED
ZEROS

Distinct: 222
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 538.34432
Minimum: 0
Maximum: 199350
Zeros: 250525
Zeros (%): 74.0%
Memory size: 2.6 MiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 0
Q3: 600
95th percentile: 1340
Maximum: 199350
Range: 199350
Interquartile range (IQR): 600

Descriptive statistics

Standard deviation: 4178.689778
Coefficient of variation (CV): 7.76211362
Kurtosis: 611.9730583
Mean: 538.34432
Median Absolute Deviation (MAD): 0
Skewness: 22.48595087
Sum: 182279080
Variance: 17461448.26
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values
Value | Count | Frequency (%)
0 | 250525 | 74.0%
1000 | 4100 | 1.2%
980 | 3894 | 1.2%
1020 | 3814 | 1.1%
1040 | 3438 | 1.0%
960 | 3120 | 0.9%
1100 | 2928 | 0.9%
1060 | 2788 | 0.8%
1080 | 2704 | 0.8%
1120 | 2311 | 0.7%
Other values (212) | 58970 | 17.4%

Minimum 5 values
Value | Count | Frequency (%)
0 | 250525 | 74.0%
320 | 5 | < 0.1%
340 | 12 | < 0.1%
360 | 17 | < 0.1%
380 | 22 | < 0.1%

Maximum 5 values
Value | Count | Frequency (%)
199350 | 17 | < 0.1%
117460 | 17 | < 0.1%
111520 | 15 | < 0.1%
108020 | 14 | < 0.1%
106600 | 14 | < 0.1%

Fukasyokin3
Real number (ℝ≥0)

HIGH CORRELATION
SKEWED
ZEROS

Distinct: 216
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 265.9456809
Minimum: 0
Maximum: 58730
Zeros: 250525
Zeros (%): 74.0%
Memory size: 2.6 MiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 0
Q3: 300
95th percentile: 670
Maximum: 58730
Range: 58730
Interquartile range (IQR): 300

Descriptive statistics

Standard deviation: 1991.399779
Coefficient of variation (CV): 7.487994435
Kurtosis: 442.5715978
Mean: 265.9456809
Median Absolute Deviation (MAD): 0
Skewness: 20.27179471
Sum: 90047080
Variance: 3965673.079
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values
Value | Count | Frequency (%)
0 | 250525 | 74.0%
500 | 4087 | 1.2%
490 | 3881 | 1.1%
510 | 3849 | 1.1%
520 | 3452 | 1.0%
480 | 3107 | 0.9%
550 | 2938 | 0.9%
530 | 2777 | 0.8%
540 | 2725 | 0.8%
560 | 2293 | 0.7%
Other values (206) | 58958 | 17.4%

Minimum 5 values
Value | Count | Frequency (%)
0 | 250525 | 74.0%
130 | 5 | < 0.1%
160 | 22 | < 0.1%
170 | 25 | < 0.1%
180 | 17 | < 0.1%

Maximum 5 values
Value | Count | Frequency (%)
58730 | 17 | < 0.1%
55760 | 15 | < 0.1%
54010 | 14 | < 0.1%
53300 | 14 | < 0.1%
52120 | 13 | < 0.1%

Fukasyokin4
Real number (ℝ≥0)

SKEWED
ZEROS

Distinct: 20
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.2750803327
Minimum: 0
Maximum: 660
Zeros: 338278
Zeros (%): 99.9%
Memory size: 2.6 MiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 0
Q3: 0
95th percentile: 0
Maximum: 660
Range: 660
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 9.699870875
Coefficient of variation (CV): 35.26195705
Kurtosis: 1883.12259
Mean: 0.2750803327
Median Absolute Deviation (MAD): 0
Skewness: 40.69144719
Sum: 93140
Variance: 94.08749499
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=20)

Common values
Value | Count | Frequency (%)
0 | 338278 | 99.9%
240 | 42 | < 0.1%
280 | 34 | < 0.1%
270 | 23 | < 0.1%
190 | 20 | < 0.1%
210 | 20 | < 0.1%
160 | 17 | < 0.1%
450 | 16 | < 0.1%
330 | 15 | < 0.1%
440 | 14 | < 0.1%
Other values (10) | 113 | < 0.1%

Minimum 5 values
Value | Count | Frequency (%)
0 | 338278 | 99.9%
130 | 5 | < 0.1%
160 | 17 | < 0.1%
170 | 13 | < 0.1%
190 | 20 | < 0.1%

Maximum 5 values
Value | Count | Frequency (%)
660 | 12 | < 0.1%
460 | 14 | < 0.1%
450 | 16 | < 0.1%
440 | 14 | < 0.1%
420 | 12 | < 0.1%

Fukasyokin5
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.6 MiB

Length

Max length: 3
Median length: 1
Mean length: 1.000076789
Min length: 1

Characters and Unicode

Total characters: 338618
Distinct characters: 3
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common values
Value | Count | Frequency (%)
0 | 338579 | > 99.9%
170 | 13 | < 0.1%

Histogram of lengths of the category

Most occurring words
Value | Count | Frequency (%)
0 | 338579 | > 99.9%
170 | 13 | < 0.1%

Most occurring characters

Value | Count | Frequency (%)
0 | 338592 | > 99.9%
1 | 13 | < 0.1%
7 | 13 | < 0.1%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 338618 | 100.0%

Most frequent character per category

Value | Count | Frequency (%)
0 | 338592 | > 99.9%
1 | 13 | < 0.1%
7 | 13 | < 0.1%

Most occurring scripts

Value | Count | Frequency (%)
Common | 338618 | 100.0%

Most frequent character per script

Value | Count | Frequency (%)
0 | 338592 | > 99.9%
1 | 13 | < 0.1%
7 | 13 | < 0.1%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 338618 | 100.0%

Most frequent character per block

Value | Count | Frequency (%)
0 | 338592 | > 99.9%
1 | 13 | < 0.1%
7 | 13 | < 0.1%

FukasyokinBefore1
Real number (ℝ≥0)

HIGH CORRELATION
SKEWED
ZEROS

Distinct: 11
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 17.86964843
Minimum: 0
Maximum: 310100
Zeros: 338398
Zeros (%): 99.9%
Memory size: 2.6 MiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 0
Q3: 0
95th percentile: 0
Maximum: 310100
Range: 310100
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 2200.478899
Coefficient of variation (CV): 123.1405815
Kurtosis: 19794.97525
Mean: 17.86964843
Median Absolute Deviation (MAD): 0
Skewness: 140.505371
Sum: 6050520
Variance: 4842107.386
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=11)

Common values
Value | Count | Frequency (%)
0 | 338398 | 99.9%
3570 | 45 | < 0.1%
3780 | 44 | < 0.1%
3850 | 26 | < 0.1%
310100 | 17 | < 0.1%
13300 | 16 | < 0.1%
3640 | 14 | < 0.1%
3080 | 10 | < 0.1%
2310 | 8 | < 0.1%
2800 | 7 | < 0.1%

Minimum 5 values
Value | Count | Frequency (%)
0 | 338398 | 99.9%
2310 | 8 | < 0.1%
2730 | 7 | < 0.1%
2800 | 7 | < 0.1%
3080 | 10 | < 0.1%

Maximum 5 values
Value | Count | Frequency (%)
310100 | 17 | < 0.1%
13300 | 16 | < 0.1%
3850 | 26 | < 0.1%
3780 | 44 | < 0.1%
3640 | 14 | < 0.1%

FukasyokinBefore2
Real number (ℝ≥0)

HIGH CORRELATION
SKEWED
ZEROS

Distinct: 22
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5.749692846
Minimum: 0
Maximum: 88600
Zeros: 338214
Zeros (%): 99.9%
Memory size: 2.6 MiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 0
Q3: 0
95th percentile: 0
Maximum: 88600
Range: 88600
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 629.3501501
Coefficient of variation (CV): 109.4580471
Kurtosis: 19713.76485
Mean: 5.749692846
Median Absolute Deviation (MAD): 0
Skewness: 140.0771627
Sum: 1946800
Variance: 396081.6114
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=22)

Common values
Value | Count | Frequency (%)
0 | 338214 | 99.9%
1020 | 55 | < 0.1%
1080 | 44 | < 0.1%
1040 | 41 | < 0.1%
1100 | 26 | < 0.1%
1060 | 24 | < 0.1%
88600 | 17 | < 0.1%
3800 | 16 | < 0.1%
980 | 15 | < 0.1%
1740 | 15 | < 0.1%
Other values (12) | 125 | < 0.1%

Minimum 5 values
Value | Count | Frequency (%)
0 | 338214 | 99.9%
620 | 9 | < 0.1%
660 | 8 | < 0.1%
780 | 7 | < 0.1%
800 | 7 | < 0.1%

Maximum 5 values
Value | Count | Frequency (%)
88600 | 17 | < 0.1%
3800 | 16 | < 0.1%
1740 | 15 | < 0.1%
1680 | 12 | < 0.1%
1620 | 12 | < 0.1%

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation and 1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, so for an exact linear relationship the slope (the angle to the x-axis) does not affect r, apart from its sign.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
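A direct sketch of that definition in Python follows. `pearson_r` is a hypothetical helper. It returns NaN for a constant variable, where r is undefined.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Covariance of X and Y divided by the product of their standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length, at least 2")
    mx, my = sum(xs) / n, sum(ys) / n
    # The 1/(n-1) factors of the covariance and of both standard deviations cancel,
    # so plain sums of (cross-)products are enough.
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return math.nan  # r is undefined for a constant variable
    return sxy / math.sqrt(sxx * syy)


x = [1.0, 2.0, 3.0, 4.0]
print(pearson_r(x, [2 * v for v in x]))         # exact positive linear relation
print(pearson_r(x, [-0.5 * v + 3 for v in x]))  # exact negative linear relation
```

Because r ignores location and scale, columns that are constant multiples of one another have r = 1. In the sample rows of this report, Honsyokin5 is one tenth of Honsyokin1, which is consistent with the high-correlation warnings for the Honsyokin columns.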

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic correlations. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation and 1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
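In other words, ρ is Pearson's r computed on ranks. The sketch below gives tied values the mean of the ranks they occupy. As far as we know, this matches the default tie handling of pandas' `rank()`. `average_ranks` and `spearman_rho` are hypothetical helpers. A local Pearson helper is included so that the block is self-contained.

```python
import math
from typing import List, Sequence


def average_ranks(xs: Sequence[float]) -> List[float]:
    """1-based ranks; tied values all get the mean of the ranks they occupy."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and xs[order[end + 1]] == xs[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # mean of 1-based positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson's r computed on the (average) ranks of X and Y."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


x = [1, 2, 3, 4, 5]
y = [v ** 2 for v in x]  # monotonic but not linear
print(pearson_r(x, y), spearman_rho(x, y))
```

For the monotonic but nonlinear pair above, ρ is exactly 1 while r falls slightly short of 1. This illustrates why ρ is the better measure for such relationships.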

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation and 1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, n(n-1)/2.
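The definition above is Kendall's τ-a. The sketch below transcribes it directly and checks every pair, so it runs in O(n²) time. `kendall_tau_a` is a hypothetical helper. With many ties, such as columns that are mostly zero, τ-a stays below 1 even when the two orderings agree. As far as we know, pandas and SciPy compute the tie-adjusted τ-b by default. We do not know which variant the report uses.

```python
from itertools import combinations
from typing import Sequence


def kendall_tau_a(xs: Sequence[float], ys: Sequence[float]) -> float:
    """(concordant - discordant) / total pairs, i.e. Kendall's tau-a."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length, at least 2")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        sign = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if sign > 0:
            concordant += 1  # the pair is ordered the same way in X and Y
        elif sign < 0:
            discordant += 1  # the pair is ordered in opposite ways
        # sign == 0: a tie in X or Y, counted as neither
    total_pairs = n * (n - 1) / 2
    return (concordant - discordant) / total_pairs


print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))  # 5 concordant pairs, 1 discordant
```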

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. It is documented extensively in the phik package documentation.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The usual empirical estimator of Cramér's V has been shown to be biased, even for large samples, so we use the bias-corrected measure proposed by Bergsma (2013).
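The sketch below implements the bias correction as it is usually stated. φ² = χ²/n is shrunk by (k-1)(r-1)/(n-1), and the row and column counts r and k are adjusted the same way before taking the square root. It starts from an r × k contingency table. It assumes no row or column of the table is all zeros and applies no continuity correction to χ². `cramers_v_corrected` is a hypothetical helper.

```python
import math
from typing import Sequence


def cramers_v_corrected(table: Sequence[Sequence[float]]) -> float:
    """Bias-corrected Cramér's V for an r x k contingency table (Bergsma 2013)."""
    r, k = len(table), len(table[0])
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(r)) for j in range(k)]

    # Pearson's chi-squared statistic against independence.
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_tot[i] * col_tot[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    phi2 = chi2 / n
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))  # remove expected bias
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    return math.sqrt(phi2_corr / min(k_corr - 1, r_corr - 1))


print(cramers_v_corrected([[50, 0], [0, 50]]))    # perfect association
print(cramers_v_corrected([[25, 25], [25, 25]]))  # independence
```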

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion visually.
The dendrogram lets you correlate variable completion more fully, revealing trends deeper than the pairwise ones visible in the correlation heatmap.

Sample

First rows

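Index | TrackCDBefore | CourseKubunCD | Honsyokin1 | Honsyokin2 | Honsyokin3 | Honsyokin4 | Honsyokin5 | Honsyokin6 | HonsyokinBefore1 | HonsyokinBefore2 | HonsyokinBefore3 | HonsyokinBefore4 | HonsyokinBefore5 | Fukasyokin1 | Fukasyokin2 | Fukasyokin3 | Fukasyokin4 | Fukasyokin5 | FukasyokinBefore1 | FukasyokinBefore2
0 | 0 | A | 148000 | 59000 | 37000 | 22000 | 14800 | 0 | 0 | 0 | 0 | 0 | 0 | 3570 | 1020 | 510 | 0 | 0 | 0 | 0
1 | 0 | B | 142000 | 57000 | 36000 | 21000 | 14200 | 0 | 0 | 0 | 0 | 0 | 0 | 3150 | 900 | 450 | 0 | 0 | 0 | 0
2 | 0 | A | 148000 | 59000 | 37000 | 22000 | 14800 | 0 | 0 | 0 | 0 | 0 | 0 | 2100 | 600 | 300 | 0 | 0 | 0 | 0
3 | 0 | B | 172000 | 69000 | 43000 | 26000 | 17200 | 0 | 0 | 0 | 0 | 0 | 0 | 4130 | 1180 | 590 | 0 | 0 | 0 | 0
4 | 0 | A | 148000 | 59000 | 37000 | 22000 | 14800 | 0 | 0 | 0 | 0 | 0 | 0 | 3570 | 1020 | 510 | 0 | 0 | 0 | 0
5 | 0 | A | 105000 | 42000 | 26000 | 16000 | 10500 | 0 | 0 | 0 | 0 | 0 | 0 | 4970 | 1420 | 710 | 0 | 0 | 0 | 0
6 | 0 | A | 105000 | 42000 | 26000 | 16000 | 10500 | 0 | 0 | 0 | 0 | 0 | 0 | 4060 | 1160 | 580 | 0 | 0 | 0 | 0
7 | 0 | A | 71000 | 28000 | 18000 | 11000 | 7100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
8 | 0 | B | 142000 | 57000 | 36000 | 21000 | 14200 | 0 | 0 | 0 | 0 | 0 | 0 | 3710 | 1060 | 530 | 0 | 0 | 0 | 0
9 | 0 | B | 71000 | 28000 | 18000 | 11000 | 7100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0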

Last rows

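Index | TrackCDBefore | CourseKubunCD | Honsyokin1 | Honsyokin2 | Honsyokin3 | Honsyokin4 | Honsyokin5 | Honsyokin6 | HonsyokinBefore1 | HonsyokinBefore2 | HonsyokinBefore3 | HonsyokinBefore4 | HonsyokinBefore5 | Fukasyokin1 | Fukasyokin2 | Fukasyokin3 | Fukasyokin4 | Fukasyokin5 | FukasyokinBefore1 | FukasyokinBefore2
338582 | 0 | B | 140000 | 56000 | 35000 | 21000 | 14000 | 0 | 0 | 0 | 0 | 0 | 0 | 3990 | 1140 | 570 | 0 | 0 | 0 | 0
338583 | 0 | B | 140000 | 56000 | 35000 | 21000 | 14000 | 0 | 0 | 0 | 0 | 0 | 0 | 3990 | 1140 | 570 | 0 | 0 | 0 | 0
338584 | 0 | B | 51000 | 20000 | 13000 | 7700 | 5100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
338585 | 0 | B | 51000 | 20000 | 13000 | 7700 | 5100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
338586 | 0 | NaN | 76000 | 30000 | 19000 | 11000 | 7600 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
338587 | 0 | B | 76000 | 30000 | 19000 | 11000 | 7600 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
338588 | 0 | B | 240000 | 96000 | 60000 | 36000 | 24000 | 0 | 0 | 0 | 0 | 0 | 0 | 2590 | 740 | 370 | 0 | 0 | 0 | 0
338589 | 0 | B | 51000 | 20000 | 13000 | 7700 | 5100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
338590 | 0 | B | 51000 | 20000 | 13000 | 7700 | 5100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
338591 | 0 | B | 70000 | 28000 | 18000 | 11000 | 7000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0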

Duplicate rows

Most frequent

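Index | TrackCDBefore | CourseKubunCD | Honsyokin1 | Honsyokin2 | Honsyokin3 | Honsyokin4 | Honsyokin5 | Honsyokin6 | HonsyokinBefore1 | HonsyokinBefore2 | HonsyokinBefore3 | HonsyokinBefore4 | HonsyokinBefore5 | Fukasyokin1 | Fukasyokin2 | Fukasyokin3 | Fukasyokin4 | Fukasyokin5 | FukasyokinBefore1 | FukasyokinBefore2 | count
11 | 0 | A | 50000 | 20000 | 13000 | 7500 | 5000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 17111
1360 | 0 | B | 50000 | 20000 | 13000 | 7500 | 5000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 12108
24 | 0 | A | 70000 | 28000 | 18000 | 11000 | 7000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 9670
2395 | 0 | C | 50000 | 20000 | 13000 | 7500 | 5000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 6041
1368 | 0 | B | 70000 | 28000 | 18000 | 11000 | 7000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 5427
38 | 0 | A | 75000 | 30000 | 19000 | 11000 | 7500 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 4398
1380 | 0 | B | 75000 | 30000 | 19000 | 11000 | 7500 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3345
13 | 0 | A | 51000 | 20000 | 13000 | 7700 | 5100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3063
2401 | 0 | C | 70000 | 28000 | 18000 | 11000 | 7000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2873
2986 | 0 | D | 50000 | 20000 | 13000 | 7500 | 5000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2011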
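The "Most frequent" table lists distinct rows that occur more than once, ordered by how often they occur. The sketch below rebuilds such a table with `collections.Counter`, assuming rows are hashable tuples. It counts every occurrence, including the first, which we assume is what the count column shows. `most_frequent_duplicates` is a hypothetical helper, not pandas-profiling's implementation.

```python
from collections import Counter
from typing import Hashable, List, Sequence, Tuple

Row = Tuple[Hashable, ...]


def most_frequent_duplicates(rows: Sequence[Row], top: int = 10) -> List[Tuple[Row, int]]:
    """Distinct rows that occur more than once, most frequent first, with total counts."""
    counts = Counter(rows)
    duplicated = [(row, c) for row, c in counts.items() if c > 1]
    duplicated.sort(key=lambda rc: rc[1], reverse=True)  # stable: ties keep first-seen order
    return duplicated[:top]


rows = [(0, "A", 50000)] * 3 + [(0, "B", 50000)] * 2 + [(0, "C", 70000)]
for row, count in most_frequent_duplicates(rows):
    print(row, count)
```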